BitTorrent is a peer-to-peer file sharing protocol used for distributing large amounts of data. BitTorrent is one of the most common protocols for transferring large files, and it has been estimated that it accounts for roughly 27–55% of all Internet traffic (depending on geographical location) as of February 2009.[1]
Programmer Bram Cohen designed the protocol in April 2001 and released a first implementation on 2 July 2001.[2] It is now maintained by Cohen's company BitTorrent, Inc. There are numerous BitTorrent clients available for a variety of computing platforms.
The BitTorrent protocol allows users to distribute large amounts of data without the heavy demands on their computers that would be needed for standard Internet hosting. A standard host's servers can easily be brought to a halt if high levels of simultaneous data flow are reached. The protocol works as an alternative data distribution method that makes even small computers (e.g. mobile phones) with low bandwidth capable of participating in large data transfers.
First, a user playing the role of file-provider makes a file available to the network. This first user's file is called a seed and its availability on the network allows other users, called peers, to connect and begin to download the seed file. As new peers connect to the network and request the same file, their computer receives a different piece of the data from the seed. Once multiple peers have multiple pieces of the seed, BitTorrent allows each to become a source for that portion of the file. The effect of this is to take on a small part of the task and relieve the initial user, distributing the file download task among the seed and many peers. With BitTorrent, no one computer needs to supply data in quantities which could jeopardize the task by overwhelming all resources, yet the same final result—each peer eventually receiving the entire file—is still reached.
After the file is successfully and completely downloaded by a given peer, the peer is able to shift roles and become an additional seed, helping the remaining peers to receive the entire file. This eventual shift from peers to seeders determines the overall 'health' of the file (as determined by the number of times a file is available in its complete form).
This distributed nature of BitTorrent leads to a flood-like spreading of a file throughout peers. As more peers join the swarm, the likelihood of a successful download increases. Relative to standard Internet hosting, this provides a significant reduction in the original distributor's hardware and bandwidth resource costs. It also provides redundancy against system problems, reduces dependence on the original distributor[3] and provides a source for the file which is generally temporary and therefore harder to trace than when provided by the enduring availability of a host in standard file distribution techniques.
A BitTorrent client is any program that implements the BitTorrent protocol. Each client is capable of preparing, requesting, and transmitting any type of computer file over a network, using the protocol. A peer is any computer running an instance of a client.
To share a file or group of files, a peer first creates a small file called a "torrent" (e.g. MyFile.torrent). This file contains metadata about the files to be shared and about the tracker, the computer that coordinates the file distribution. Peers that want to download the file must first obtain a torrent file for it and connect to the specified tracker, which tells them from which other peers to download the pieces of the file.
Though both ultimately transfer files over a network, a BitTorrent download differs from a classic download (as is typical with an HTTP or FTP request, for example) in several fundamental ways:
Taken together, these differences allow BitTorrent to achieve much lower cost to the content provider, much higher redundancy, and much greater resistance to abuse or to "flash crowds" than regular server software. However, this protection, theoretically, comes at a cost: downloads can take time to rise to full speed because it may take time for enough peer connections to be established, and it may take time for a node to receive sufficient data to become an effective uploader. This contrasts with regular downloads (such as from an HTTP server, for example) that, while more vulnerable to overload and abuse, rise to full speed very quickly and maintain this speed throughout.
In general, BitTorrent's non-contiguous download methods have prevented it from supporting "progressive downloads" or "streaming playback". However, comments made by Bram Cohen in January 2007 suggest that streaming torrent downloads will soon be commonplace, and ad-supported streaming appears to be the result of those comments.
The peer distributing a data file treats the file as a number of identically sized pieces, usually with byte sizes of a power of 2, and typically between 32 KB and 4 MB each. The peer creates a hash for each piece, using the SHA-1 hash function, and records it in the torrent file. Pieces with sizes greater than 512 KB will reduce the size of a torrent file for a very large payload, but are claimed to reduce the efficiency of the protocol.[5] When another peer later receives a particular piece, the hash of the piece is compared to the recorded hash to test that the piece is error-free.[6] Peers that provide a complete file are called seeders, and the peer providing the initial copy is called the initial seeder.
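The piece hashing and verification described above can be sketched as follows. This is a minimal illustration rather than any client's actual code: the function names and the 256 KB default piece size are assumptions chosen for the example.

```python
import hashlib
from typing import List

PIECE_LENGTH = 256 * 1024  # assumed piece size: 256 KB, a power of two


def piece_hashes(data: bytes, piece_length: int = PIECE_LENGTH) -> List[bytes]:
    """Split data into fixed-size pieces and return the SHA-1 digest of each.

    The final piece may be shorter than piece_length.
    """
    return [
        hashlib.sha1(data[offset:offset + piece_length]).digest()
        for offset in range(0, len(data), piece_length)
    ]


def verify_piece(piece: bytes, index: int, hashes: List[bytes]) -> bool:
    """Compare the hash of a received piece with the hash recorded for it."""
    return hashlib.sha1(piece).digest() == hashes[index]
```

A downloader would call `verify_piece` on each piece as it arrives and request the piece again if the check fails.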
The exact information contained in the torrent file depends on the version of the BitTorrent protocol. By convention, the name of a torrent file has the suffix .torrent. Torrent files have an "announce" section, which specifies the URL of the tracker, and an "info" section, containing (suggested) names for the files, their lengths, the piece length used, and a SHA-1 hash code for each piece, all of which are used by clients to verify the integrity of the data they receive.
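As a rough sketch of how such a file could be assembled, the example below uses bencoding, the serialization used by the original protocol, and derives the torrent's identity (the info-hash) as the SHA-1 of the encoded "info" section. The helper names, the tracker URL and the file name are hypothetical, and only a single-file layout is shown.

```python
import hashlib


def bencode(value) -> bytes:
    """Encode ints, strings/bytes, lists and dicts using bencoding."""
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        # Dictionary keys are byte strings and are written in sorted order.
        items = sorted(
            ((k.encode("utf-8") if isinstance(k, str) else k, v)
             for k, v in value.items()),
            key=lambda kv: kv[0],
        )
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError("cannot bencode " + type(value).__name__)


def info_hash(info: dict) -> str:
    """The torrent's identity: SHA-1 over the bencoded info section."""
    return hashlib.sha1(bencode(info)).hexdigest()


# Hypothetical single-file torrent with 4-byte pieces, for illustration only.
pieces = [b"aaaa", b"aaaa", b"aa"]
info = {
    "name": "MyFile.bin",
    "length": 10,
    "piece length": 4,
    "pieces": b"".join(hashlib.sha1(p).digest() for p in pieces),
}
torrent = {"announce": "http://tracker.example/announce", "info": info}
torrent_bytes = bencode(torrent)  # contents of MyFile.torrent
```

Because the info-hash covers the whole "info" section, changing any field inside it yields a different torrent identity.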
Torrent files are typically published on websites or elsewhere, and registered with at least one tracker. The tracker maintains lists of the clients currently participating in the torrent.[6] Alternatively, in a trackerless system (decentralized tracking) every peer acts as a tracker. Azureus was the first BitTorrent client to implement such a system through the distributed hash table (DHT) method. An alternative and incompatible DHT system, known as Mainline DHT, was later developed and adopted by the BitTorrent (Mainline), µTorrent, Transmission, rTorrent, KTorrent, BitComet, and Deluge clients.
After the DHT was adopted, a "private" flag—analogous to the broadcast flag—was unofficially introduced, telling clients to restrict the use of decentralized tracking regardless of the user's desires.[7] The flag is intentionally placed in the info section of the torrent so that it cannot be disabled or removed without changing the identity of the torrent. The purpose of the flag is to prevent torrents from being shared with clients that do not have access to the tracker. The flag was requested for inclusion in the official specification in August, 2008, but has not been accepted.[8] Clients that have ignored the private flag were banned by many trackers, discouraging the practice.[9]
Users browse the web to find a torrent of interest, download it, and open it with a BitTorrent client. The client connects to the tracker(s) specified in the torrent file, from which it receives a list of peers currently transferring pieces of the file(s) specified in the torrent. The client connects to those peers to obtain the various pieces. If the swarm contains only the initial seeder, the client connects directly to it and begins to request pieces.
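The first request a client sends to a tracker can be sketched as building an HTTP announce URL. The query parameter names below follow the commonly documented HTTP tracker convention; the function name, tracker URL and peer ID are hypothetical, and the sketch assumes the announce URL has no query string of its own.

```python
from urllib.parse import urlencode


def announce_url(tracker: str, info_hash: bytes, peer_id: bytes,
                 port: int, left: int) -> str:
    """Build the URL a client would fetch to join a swarm.

    info_hash and peer_id are raw 20-byte values; urlencode
    percent-escapes any bytes that are not URL-safe.
    """
    query = urlencode({
        "info_hash": info_hash,
        "peer_id": peer_id,
        "port": port,
        "uploaded": 0,
        "downloaded": 0,
        "left": left,        # bytes still needed by this client
        "event": "started",
    })
    return tracker + "?" + query
```

The tracker's reply, not modelled here, carries the list of peers that the client then contacts directly.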
Clients incorporate mechanisms to optimize their download and upload rates; for example they download pieces in a random order to increase the opportunity to exchange data, which is only possible if two peers have different pieces of the file.
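A minimal sketch of random piece selection, assuming pieces are identified by integer index and that each side's holdings are known as a set. The function name is illustrative, and real clients typically combine this with other heuristics.

```python
import random
from typing import Optional, Set


def pick_piece(have: Set[int], peer_has: Set[int],
               rng: random.Random) -> Optional[int]:
    """Pick a random piece that the peer can offer and we still lack.

    Returns None when the peer has nothing new for us, in which case
    no exchange with that peer is possible.
    """
    candidates = sorted(peer_has - have)
    return rng.choice(candidates) if candidates else None
```

Choosing at random makes it likely that two downloaders end up holding different pieces and can therefore trade with each other.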
The effectiveness of this data exchange depends largely on the policies that clients use to determine to whom to send data. Clients may prefer to send data to peers that send data back to them (a tit-for-tat scheme), which encourages fair trading. But strict policies often result in suboptimal situations, such as when newly joined peers are unable to receive any data because they do not yet have any pieces to trade, or when two peers with a good connection between them do not exchange data simply because neither of them takes the initiative. To counter these effects, the official BitTorrent client program uses a mechanism called “optimistic unchoking”, whereby the client reserves a portion of its available bandwidth for sending pieces to random peers (not necessarily known good partners, the so-called preferred peers) in hopes of discovering even better partners and to ensure that newcomers get a chance to join the swarm.[10]
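The policy above can be sketched as a slot-allocation function: most upload slots go to the peers that have sent the most data back, and one extra slot goes to a randomly chosen other peer. The function name, the slot count and the use of measured download rates are assumptions made for illustration, not the official client's implementation.

```python
import random
from typing import Dict, Set


def choose_unchoked(rate_from_peer: Dict[str, float], regular_slots: int,
                    rng: random.Random) -> Set[str]:
    """Select the peers to upload to in the next round.

    rate_from_peer maps a peer ID to the rate at which that peer has
    recently been sending us data.
    """
    ranked = sorted(rate_from_peer, key=rate_from_peer.get, reverse=True)
    unchoked = set(ranked[:regular_slots])   # tit-for-tat: reward the best uploaders
    others = ranked[regular_slots:]
    if others:
        unchoked.add(rng.choice(others))     # optimistic unchoke: one random peer
    return unchoked
```

The random slot is what lets a newcomer with nothing to trade receive its first pieces.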
The community of BitTorrent users frowns upon the practice of disconnecting from the network immediately upon success of a file download, and encourages remaining as another seed for as long as practical, which may be days – especially when there are a lot of downloading peers and when the ratio of seeders to downloading peers is low.
A growing number of individuals and organizations are using BitTorrent to distribute their own or licensed material. Independent adopters report[11] that without using BitTorrent technology and its dramatically reduced demands on their private networking hardware and bandwidth, they could not afford to distribute their files.
CableLabs, the research organization of the North American cable industry, estimates that BitTorrent represents 18% of all broadband traffic.[23] In 2004, CacheLogic put that number at roughly 35% of all traffic on the Internet.[24] The discrepancies in these numbers are caused by differences in the method used to measure P2P traffic on the Internet.[25]
Routers that use network address translation (NAT) must maintain tables of source and destination IP addresses and ports. Typical home routers are limited to about 2000 table entries, while some more expensive routers have larger table capacities. BitTorrent frequently contacts 300–500 servers per second, rapidly filling the NAT tables. This is a common cause of home routers locking up.[26]
The BitTorrent protocol provides no way to index torrent files. As a result, a comparatively small number of websites have hosted a large majority of torrents, many linking to copyrighted material without the authorization of copyright holders, rendering those sites especially vulnerable to lawsuits.[27] Several types of websites support the discovery and distribution of data on the BitTorrent network.
Public torrent hosting sites such as The Pirate Bay allow users to search and download from their collection of torrent files. Users can typically also upload torrent files for content they wish to distribute. Often, these sites also run BitTorrent trackers for their hosted torrent files, but these two functions are not mutually dependent: a torrent file could be hosted on one site and tracked by another, unrelated site.
Private host/tracker sites operate like public ones except that they restrict access to registered users and keep track of the amount of data each user uploads and downloads, in an attempt to reduce leeching.
Search engines allow the discovery of torrent files that are hosted and tracked on other sites; examples include Mininova, BTJunkie, Torrentz, The Pirate Bay, Eztorrent and isoHunt. These sites allow the user to ask for content meeting specific criteria (such as containing a given word or phrase) and retrieve a list of links to torrent files matching those criteria. This list can often be sorted by several criteria, of which relevance (the seeders-to-leechers ratio) is one of the most popular and useful, because the achievable download bandwidth is very sensitive to this value. Bram Cohen launched a BitTorrent search engine on http://search.bittorrent.com that co-mingles licensed content with search results.[28] Metasearch engines allow one to search several BitTorrent indices and search engines at once.
Although swarming scales well to tolerate flash crowds for popular content, it is less useful for unpopular content. Peers arriving after the initial rush might find the content unavailable and need to wait for the arrival of a seed in order to complete their downloads. The seed arrival, in turn, may take long to happen, since maintaining seeds for unpopular content entails high bandwidth and administrative costs, which runs counter to the goals of publishers that value BitTorrent as a cheap alternative to a client-server approach. A strategy adopted by many publishers which significantly increases availability of unpopular content consists of bundling multiple files in a single swarm.[29]
BitTorrent does not offer its users anonymity. It is possible to obtain the IP addresses of all current and possibly previous participants in a swarm from the tracker. This may expose users with insecure systems to attacks.[10] It may also expose users to the risk of being sued, if they are distributing files without permission from the copyright holder(s). However, there are ways to promote anonymity; for example, the OneSwarm project layers privacy-preserving sharing mechanisms on top of the original BitTorrent protocol.
A BitTorrent user may often choose to leave the swarm as soon as they have a complete copy of the file they are downloading, freeing up their outbound bandwidth for other uses. If enough users follow this pattern, torrent swarms gradually die out, meaning a lower possibility of obtaining older torrents (see content unavailability above). Some BitTorrent websites have attempted to address this by recording each user's download and upload ratio, visible either to all users or only to the user concerned, and by giving people with better ratios access to newer torrent files. Users who have low upload ratios may see slower download speeds until they upload more. This prevents (statistical) leeching, since after a while such users become unable to download at even a fraction of the theoretical bandwidth of their connection. Some trackers exempt dial-up users from this policy, because their uploading capabilities are limited. The BitTorrent protocol also attempts to minimize the damage done by leeches by using only a portion of a peer's bandwidth for one-directional trades and using the majority for two-directional trades that tend to help the swarm as a whole.
There are "cheating" clients like BitThief which claim to be able to download without uploading. Such exploitation negatively affects the cooperative nature of the BitTorrent protocol, although it might prove useful for people in countries where unauthorized uploading of copyrighted material is illegal, but downloading is not.
Average BitTorrent download speed is usually the sum of that peer's upload speed and a fair share of the total upload of all the "seeders in the swarm" (peers logged with the tracker that have a complete copy of the file). The 'tit-for-tat' style file sharing of downloading peers is responsible for the portion of the available download that's the same as the peer's upload. The seeders attempt to provide fair shares by scattering pieces to a wide selection of the best performing peers.
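The rule of thumb above can be written as a small calculation. This is a sketch only: it assumes a "fair share" means the seeders' combined upload divided equally among the downloading peers, and it ignores the cap imposed by the peer's own downlink. The function and parameter names are illustrative.

```python
from typing import Iterable


def estimated_download(peer_upload: float, seeder_uploads: Iterable[float],
                       downloaders: int) -> float:
    """Estimate a peer's average download rate.

    The tit-for-tat portion matches the peer's own upload rate; the
    remainder is an equal share of the total seeder upload.
    All rates must be in the same unit (e.g. KB/s).
    """
    if downloaders < 1:
        raise ValueError("at least one downloading peer is required")
    return peer_upload + sum(seeder_uploads) / downloaders
```

For example, a peer uploading at 50 KB/s in a swarm of 10 downloaders served by two seeders at 100 KB/s each would be estimated at 50 + 200 / 10 = 70 KB/s.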
ISPs often provide asymmetrical Internet connections, with much higher download than upload speeds. Since a peer can only download data that has been uploaded by another peer, this asymmetry is suboptimal for the BitTorrent protocol. This performance issue is most obvious during the early life of a swarm, when there is only one peer that is seeding (has a complete copy of the file) and all the other peers have exactly the same portion of the file as each other. When you initially join such a swarm you will get very high download speeds, as every other peer optimistically sends you pieces in the hope that you have something to send them. This will probably continue until your peer catches up with the rest of the swarm, at which point your average download speed drops to exactly the upload speed of the seeder. The data is uploaded by the seeder to one peer, and that peer passes it on down the line to the next in the swarm, and so forth to everyone in the swarm.
If all peers in the swarm have symmetrical connections the swarm becomes far more stable. During the initial startup the swarm will be less able to draw new arrivals to the current maximum level of the swarm so the "everybody becomes a seeder" threshold is much less of an instant switch and more of a controlled cascade. The balance between the upload and the download also means that the majority of a peer's download is as a result of the 'tit-for-tat' file sharing which reduces the cost of seeding to a swarm by forcing the natural ratio of a peer closer to the overall 1:1 requirement of the swarm as a whole. If this reduction in 'seeder cost' were to happen in the wild it would probably result in much longer lived swarms too.
As symmetrical connections are uncommon, swarms are normally in the "seeder starved" state, where there is very little seeder upload bandwidth available and each peer gets about the same download as its upload. Additional upload bandwidth can be made available to a swarm through the use of "seed boxes" and "HTTP seeds", but this is quite rare with public torrents.
The BitTorrent protocol is still under development and therefore may still acquire new features and other enhancements such as improved efficiency.
On May 2, 2005, Azureus 2.3.0.0 (now known as Vuze) was released,[30] introducing support for "trackerless" torrents through a system called the "distributed database." This system is a DHT implementation which allows the client to use torrents that do not have a working BitTorrent tracker. The following month, BitTorrent, Inc. released version 4.2.0 of the Mainline BitTorrent client, which supported an alternative DHT implementation (popularly known as "Mainline DHT") that is incompatible with that of Azureus. Current versions of the official BitTorrent client, µTorrent, BitComet, and BitSpirit all share compatibility with Mainline DHT. Both DHT implementations are based on Kademlia.[31] As of version 3.0.5.0, Azureus also supports Mainline DHT in addition to its own distributed database through use of an optional application plugin.[32] This potentially allows the Azureus client to reach a bigger swarm.
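Kademlia-based DHTs decide which nodes are responsible for a key by comparing identifiers with an XOR distance metric. The sketch below illustrates only that metric and a nearest-node lookup over a locally known list; the function names and the value of `k` are assumptions, and no network messaging is modelled.

```python
from typing import List


def xor_distance(a: bytes, b: bytes) -> int:
    """Kademlia distance: the XOR of two equal-length IDs, read as an integer."""
    return int.from_bytes(a, "big") ^ int.from_bytes(b, "big")


def closest_nodes(target: bytes, node_ids: List[bytes], k: int = 8) -> List[bytes]:
    """Return the k known node IDs closest to the target ID."""
    return sorted(node_ids, key=lambda node: xor_distance(node, target))[:k]
```

In a trackerless torrent, a client would look up the nodes closest to the torrent's info-hash and ask them for peers.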
Another idea that has surfaced in Vuze is that of virtual torrents. This idea is based on the distributed tracker approach and is used to describe some web resource. Currently, it is used for instant messaging. It is implemented using a special messaging protocol and requires an appropriate plugin. Anatomic P2P is another approach, which uses a decentralized network of nodes that route traffic to dynamic trackers.
Most BitTorrent clients also use Peer exchange (PEX) to gather peers in addition to trackers and DHT. Peer exchange checks with known peers to see if they know of any other peers. With the 3.0.5.0 release of Vuze, all major BitTorrent clients now have compatible peer exchange.
Web seeding was implemented in 2006 as the ability of BitTorrent clients to download torrent pieces from an HTTP source in addition to the swarm. The advantage of this feature is that a site may distribute a torrent for a particular file or batch of files and make those files available for download from that same web server; this can simplify long term seeding and load balancing through the use of existing, cheap, web hosting setups. In theory, this would make using BitTorrent almost as easy for a web publisher as creating a direct HTTP download. In addition, it would allow the 'web seed' to be disabled if the swarm becomes too popular while still allowing the file to be readily available.
This feature has two specifications.
The first was created by John "TheSHAD0W" Hoffman, who created BitTornado.[33] From version 5.0 onward the Mainline BitTorrent client also supports web seeds and the BitTorrent web site had[34] a simple publishing tool that creates web seeded torrents.[35] µTorrent added support for web seeds in version 1.7. BitComet added support for web seeds in version 1.14. This first specification requires running a web service that serves content by info-hash and piece number, rather than filename.
The other specification can rely on a basic HTTP download space.[36]
A technique called Broadcatching combines RSS with the BitTorrent protocol to create a content delivery system, further simplifying and automating content distribution. Steve Gillmor explained the concept in a column for Ziff-Davis in December, 2003.[37] The discussion spread quickly among bloggers (Ernest Miller,[38] Chris Pirillo, etc.). In an article entitled Broadcatching with BitTorrent, Scott Raymond explained:
I want RSS feeds of BitTorrent files. A script would periodically check the feed for new items, and use them to start the download. Then, I could find a trusted publisher of an Alias RSS feed, and 'subscribe' to all new episodes of the show, which would then start downloading automatically — like the 'season pass' feature of the TiVo.—Scott Raymond, scottraymond.net[39]
The RSS feed will track the content, while BitTorrent ensures content integrity with cryptographic hashing of all data, so feed subscribers will receive uncorrupted content.
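The periodic feed check described in the quotation can be sketched with the standard library's XML parser. The sample feed, its URLs and the function name are invented for illustration; a real script would fetch the feed over HTTP and hand each new URL to a BitTorrent client.

```python
import xml.etree.ElementTree as ET
from typing import List, Set

# Hypothetical RSS 2.0 feed whose enclosures point at .torrent files.
SAMPLE_FEED = """<rss version="2.0"><channel><title>Example show</title>
<item><title>Episode 1</title>
<enclosure url="http://example.org/ep1.torrent" type="application/x-bittorrent" length="0"/></item>
<item><title>Episode 2</title>
<enclosure url="http://example.org/ep2.torrent" type="application/x-bittorrent" length="0"/></item>
</channel></rss>"""


def new_torrent_urls(feed_xml: str, seen: Set[str]) -> List[str]:
    """Return enclosure URLs for torrents not seen before, and remember them."""
    root = ET.fromstring(feed_xml)
    urls = []
    for enclosure in root.iter("enclosure"):
        url = enclosure.get("url")
        if url and url.endswith(".torrent") and url not in seen:
            urls.append(url)
            seen.add(url)
    return urls
```

Calling the function again with the same `seen` set returns only episodes published since the previous check.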
One of the first popular software clients (free and open source) for broadcatching is Miro. Other free software clients such as PenguinTV and KatchTV now also support broadcatching.
The BitTorrent web-service MoveDigital has the ability to make torrents available to any web application capable of parsing XML through its standard REST-based interface.[40] Additionally, Torrenthut is developing a similar torrent API that will provide the same features, as well as further intuition to help bring the torrent community to Web 2.0 standards. Alongside this release is a first PHP application built using the API called PEP, which will parse any Really Simple Syndication (RSS 2.0) feed and automatically create and seed a torrent for each enclosure found in that feed.[41]
Since BitTorrent makes up a large proportion of total traffic, some ISPs have chosen to throttle (slow down) BitTorrent transfers to ensure network capacity remains available for other uses. For this reason, methods have been developed to disguise BitTorrent traffic in an attempt to thwart these efforts.[42]
Protocol header encryption (PHE) and Message stream encryption/Protocol encryption (MSE/PE) are features of some BitTorrent clients that attempt to make BitTorrent hard to detect and throttle. At the moment Vuze, BitComet, KTorrent, Transmission, Deluge, µTorrent, MooPolice, Halite, rTorrent and the latest official BitTorrent client (v6) support MSE/PE encryption.
In September 2006 it was reported that some software could detect and throttle BitTorrent traffic masquerading as HTTP traffic.[43]
Reports in August 2007 indicated that Comcast was preventing BitTorrent seeding by monitoring and interfering with the communication between peers. Protection against these efforts is provided by proxying the client-tracker traffic via an encrypted tunnel to a point outside of the Comcast network.[44] Comcast has more recently called a 'truce' with BitTorrent, Inc. with the intention of shaping traffic in a protocol-agnostic manner.[45] Questions about the ethics and legality of Comcast's behavior have led to renewed debate about Net neutrality in the United States.[46]
In general, although encryption can make it difficult to determine what is being shared, BitTorrent is vulnerable to traffic analysis. Thus even with MSE/PE, it may be possible for an ISP to recognize BitTorrent and also to determine that a system is no longer downloading but only uploading data, and terminate its connection by injecting TCP RST (reset flag) packets.
Another unofficial feature is an extension to the BitTorrent metadata format proposed by John Hoffman[47] and implemented by several indexing websites. It allows the use of multiple trackers per file, so if one tracker fails, others can continue supporting file transfer. It is implemented in several clients, such as BitComet, BitTornado, BitTorrent, KTorrent, Transmission, Deluge, µTorrent, rTorrent and Vuze. Trackers are placed in groups, or tiers, with a tracker randomly chosen from the top tier and tried, moving to the next tier if all the trackers in the top tier fail.
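The tier fallback described above can be sketched as follows. The function name and the `try_tracker` callback are hypothetical stand-ins for a client's real announce logic; the callback is assumed to return True when a tracker responds successfully.

```python
import random
from typing import Callable, List, Optional


def announce_with_tiers(tiers: List[List[str]],
                        try_tracker: Callable[[str], bool],
                        rng: random.Random) -> Optional[str]:
    """Try trackers tier by tier and return the first URL that responds.

    Within a tier the trackers are tried in random order; the next tier
    is used only if every tracker in the current tier fails.
    """
    for tier in tiers:
        candidates = list(tier)
        rng.shuffle(candidates)
        for url in candidates:
            if try_tracker(url):
                return url
    return None  # every tracker in every tier failed
```

Randomizing within a tier spreads load across equivalent trackers, while the tier order lets a publisher designate backups.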
Torrents with multiple trackers[48] can decrease the time it takes to download a file, but also have a few consequences:
Even with distributed trackers, a third party is still required to find a specific torrent. This is usually done in the form of a hyperlink from the website of the content owner or through indexing websites like The Pirate Bay or Torrentz.
The Tribler BitTorrent client is the first to incorporate decentralized search capabilities. With Tribler, users can find .torrent files that are hosted among other peers, instead of on centralized index sites. It adds such an ability to the BitTorrent protocol using a gossip protocol, somewhat similar to the eXeem network which was shut down in 2005. The software includes the ability to recommend content as well. After a dozen downloads the Tribler software can roughly estimate the download taste of the user and recommend additional content.[50]
In May 2007 Cornell University published a paper proposing a new approach to searching a peer-to-peer network for inexact strings,[51] which could replace the functionality of a central indexing site. A year later, the same team implemented the system as a plugin for Vuze called Cubit[52] and published a follow-up paper reporting its success.[53]
A somewhat similar facility but with a slightly different approach is provided by the BitComet client through its "Torrent Exchange"[54] feature. Whenever two peers using BitComet (with Torrent Exchange enabled) connect to each other they exchange lists of all the torrents (name and info-hash) they have in the Torrent Share storage (torrent files which were previously downloaded and for which the user chose to enable sharing by Torrent Exchange).
Thus each client builds up a list of all the torrents shared by the peers it connected to in the current session (or it can even maintain the list between sessions if instructed). At any time the user can search into that Torrent Collection list for a certain torrent and sort the list by categories. When the user chooses to download a torrent from that list, the .torrent file is automatically searched for (by info-hash value) in the DHT Network and when found it is downloaded by the querying client which can after that create and initiate a downloading task.
The GitTorrent Protocol (GTP)[55] is, as of 2008, an alpha-version of a protocol designed for collaborative git repository distribution across the Internet.
An as-yet (2 February 2008) unimplemented unofficial feature is Similarity Enhanced Transfer (SET), a technique for improving the speed at which peer-to-peer file sharing and content distribution systems can share data. SET, proposed by researchers Pucha, Andersen, and Kaminsky, works by spotting chunks of identical data in files that are an exact or near match to the one needed and transferring these data to the client if the 'exact' data are not present. Their experiments suggested that SET will help greatly with less popular files, but not as much for popular data, where many peers are already downloading it.[58] Andersen believes that this technique could be immediately used by developers with the BitTorrent file sharing system.[59]
As of December 2008, BitTorrent, Inc. is working with Oversi on new Policy Discover Protocols that query the ISP for capabilities and network architecture information. Oversi's ISP hosted NetEnhancer box is designed to "improve peer selection" by helping peers find local nodes, improving download speeds while reducing the loads into and out of the ISP's network.[60]
There has been much controversy over the use of BitTorrent trackers. BitTorrent metafiles themselves do not store file contents. Whether the publishers of BitTorrent metafiles violate copyrights by linking to copyrighted material without the authorization of copyright holders is controversial.
Various jurisdictions have pursued legal action against websites that host BitTorrent trackers. High-profile examples include the closing of Suprnova.org, Torrentspy, LokiTorrent, Mininova and OiNK.cd. The Pirate Bay torrent website, formed by a Swedish group, is noted for the "legal" section of its website in which letters and replies on the subject of alleged copyright infringements are publicly displayed. On 31 May 2006, The Pirate Bay's servers in Sweden were raided by Swedish police on allegations by the MPAA of copyright infringement;[61] however, the tracker was up and running again three days later.
HBO, in an effort to combat the distribution of its programming on BitTorrent networks, has sent cease and desist letters to the Internet Service Providers of BitTorrent users. Many users have reported receiving letters from their ISPs that threatened to cut off their Internet service if the alleged infringement continues.[62] HBO, unlike the RIAA, has not been reported to have filed suit against anyone for sharing files as of April 2007. In 2005 HBO began "poisoning" torrents of its show Rome, by providing bad chunks of data to clients.[63] BitTorrent clients will eventually realize that the data is corrupted, but this does make it take longer to download.
On 23 November 2005, the movie industry and BitTorrent, Inc. CEO Bram Cohen signed a deal they hoped would reduce the number of unlicensed copies available through bittorrent.com's search engine, run by BitTorrent, Inc. It meant BitTorrent.com had to remove any links to unlicensed copies of films made by seven of Hollywood's major movie studios.
More recently, the BitTorrent network has been subject to scrutiny by the British Phonographic Industry (BPI). There are suggestions that they are using the network to obtain the IP addresses of those currently connected to the tracker. The information is then used to contact the ISP of each downloader so that notifications can be made (this was given sizeable coverage in the UK press with regard to Virgin Media sending letters out to customers suspected of using P2P networks).
There are two major differences between BitTorrent and many other peer-to-peer file-trading systems, which advocates suggest make it less useful to those sharing copyrighted material without authorization. First, BitTorrent itself does not offer a search facility to find files by name. A user must find the initial torrent file by other means, such as a web search. Second, BitTorrent makes no attempt to conceal the host ultimately responsible for facilitating the sharing: a person who wishes to make a file available must run a tracker on a specific host or hosts and distribute the tracker address(es) in the .torrent file. Because it is possible to operate a tracker on a server that is located in a jurisdiction where the copyright holder cannot take legal action, the protocol does offer some vulnerability that other protocols lack. It is far easier to request that the server's ISP shut down the site than it is to find and identify every user sharing a file on a peer-to-peer network. However, with the use of a distributed hash table (DHT), trackers are no longer required, though they are often still used so that client software that does not support DHT can connect to the swarm.